59 research outputs found

    Sampling from Rough Energy Landscapes

    We examine challenges to sampling from Boltzmann distributions associated with multiscale energy landscapes. The multiscale features, or "roughness," correspond to highly oscillatory, but bounded, perturbations of a smooth landscape. Through a combination of numerical experiments and analysis, we demonstrate that the performance of the Metropolis Adjusted Langevin Algorithm can be severely attenuated as the roughness increases. In contrast, we prove that Random Walk Metropolis is insensitive to such roughness. We also formulate two alternative sampling strategies that incorporate large scale features of the energy landscape, while resisting the impact of fine scale roughness; these also outperform Random Walk Metropolis. Numerical experiments on these landscapes are presented that confirm our predictions. Open questions and numerical challenges are also highlighted. Comment: 34 pages, first revision
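
The contrast between the two samplers named in this abstract can be explored with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the one-dimensional potential V(x) = x^2/2 + amp*cos(x/eps), the parameter names `eps`, `amp`, `step`, and `h`, and the helper names. The perturbation stays bounded by `amp`, while its gradient grows like `amp/eps`, which is the quantity that enters the Langevin proposal.

```python
import math
import random


def make_potential(eps, amp=0.125):
    """Return V and its gradient for V(x) = x^2/2 + amp*cos(x/eps).

    Illustrative choice, not the paper's test problem: the cosine term is
    a bounded perturbation whose gradient scales like amp/eps.
    """
    def V(x):
        return 0.5 * x * x + amp * math.cos(x / eps)

    def grad_V(x):
        return x - (amp / eps) * math.sin(x / eps)

    return V, grad_V


def _accept(rng, log_alpha):
    """Metropolis accept/reject decision for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_alpha))


def rwm_acceptance_rate(V, step, n_steps, seed=0):
    """Random Walk Metropolis: symmetric Gaussian proposal, no gradient."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        y = x + step * rng.gauss(0.0, 1.0)
        if _accept(rng, V(x) - V(y)):
            x, accepted = y, accepted + 1
    return accepted / n_steps


def mala_acceptance_rate(V, grad_V, h, n_steps, seed=0):
    """MALA: Langevin proposal with a Metropolis-Hastings correction."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        return -((to - frm + h * grad_V(frm)) ** 2) / (4.0 * h)

    x, accepted = 0.0, 0
    for _ in range(n_steps):
        y = x - h * grad_V(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        log_alpha = V(x) - V(y) + log_q(x, y) - log_q(y, x)
        if _accept(rng, log_alpha):
            x, accepted = y, accepted + 1
    return accepted / n_steps
```

Comparing `mala_acceptance_rate` and `rwm_acceptance_rate` for a smooth potential (`amp=0.0`) and a rough one (small `eps`) at fixed step sizes gives a rough feel for the sensitivity the abstract describes. Acceptance rate is only a crude proxy for sampler performance, so this is not a reproduction of the paper's experiments.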

    Versification and Authorship Attribution

    The technique known as contemporary stylometry uses different methods, including machine learning, to discover a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.
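
The lexical features mentioned above (word frequencies and character n-gram frequencies) can be illustrated with a short sketch. The function names, the whitespace tokenization, and the lowercasing are assumptions made for this example. They are not the feature extraction used in the book, and versification features such as rhythm patterns or rhyme types would need metrical annotation that is not shown here.

```python
from collections import Counter


def relative_frequencies(items):
    """Map each distinct item to its share of the total count."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()} if total else {}


def word_frequencies(text):
    """Relative word frequencies, using naive lowercase whitespace tokens."""
    return relative_frequencies(text.lower().split())


def char_ngram_frequencies(text, n=3):
    """Relative frequencies of overlapping character n-grams.

    Whitespace runs are collapsed to single spaces so that line breaks in
    verse do not create spurious n-grams.
    """
    normalized = " ".join(text.lower().split())
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    return relative_frequencies(grams)
```

Frequency vectors like these are typically passed to a classifier, one vector per text sample; that step is left out because the passage does not specify a model.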

    The Corpus of Czech Verse

    The article presents the Corpus of Czech Verse (i.e. a lemmatised, phonetically, morphologically, metrically and strophically annotated corpus of Czech poetry) and the online tools and frequency lists that give access to its data. The following online tools are described: Database of Czech metres – the main tool for working with the corpus data; Gunstick – a web application for investigating the frequency of rhyme pairs and their historical development; Hex – an application that lets users search the Corpus of Czech Verse for texts containing a user-specified keyword, or display all keywords found in a user-specified group of texts; and Euphonometer – an application that quantifies the degree of non-randomness of sound repetition in any text.

    Metre and Semantics in the Poetry of Czech Post-Symbolists Accessed via LDA Topic Modelling

    The article deals with the relationship between semantics and poetic metre in the works of Czech post-symbolist poets and their predecessors. We approach these phenomena by means of machine-driven metre recognition on the one hand and LDA topic modelling on the other. We first show how the poetic groups differ in their general preferences for particular topics. Next, we analyze the topic distributions in the two dominant metres (i.e. iamb and trochee) across the poetic groups.

    Authorship Attribution of Poetic Texts

    Title: Authorship Attribution of Poetic Texts. Author: Mgr. Petr Plecháč, Ph.D. Department: Institute of the Czech National Corpus. Supervisor: doc. Mgr. Václav Cvrček, Ph.D. Abstract: Contemporary stylometry offers a number of methods for recognizing the authorship of poetic texts, based on a variety of textual features (e.g. word frequencies, frequencies of character n-grams). However, one important aspect of these texts has been largely left aside: versification. The thesis uses four corpora of poetic texts (Czech, German, Spanish, and English) to analyze to what extent versification features, such as frequencies of rhythmic patterns or frequencies of various types of rhymes, may be used as an indicator of authorship. We show that (1) versification-based models significantly outperform the random baseline, (2) in some cases versification-based models even outperform the traditionally used lexical models, and (3) in most cases a combination of both types of models outperforms either model alone. Versification features are then employed to attribute two texts of disputed authorship: (1) the verse play The Famous History of the Life of King Henry the Eight, which was originally published under the name of William Shakespeare, but in which John Fletcher, and possibly other authors, are also assumed to have taken part... Institute of the Czech National Corpus, Faculty of Arts